
feat(tools/modal): add Function + Volume capabilities #55

Closed
KillerQueen-Z wants to merge 12 commits into main from feat/modal-functions-volumes


@KillerQueen-Z (Collaborator)

Summary

Adds 6 new Modal capabilities for long-running GPU workflows. Pairs with BlockRunAI/blockrun#16: the gateway must be merged and deployed first, and this PR is a no-op until then.

New capabilities

| Tool | Purpose |
|---|---|
| `ModalDeployFunction` | Register a long-running Python function (custom pip deps, GPU choice, up to 24h). Charges max_timeout × hourly rate upfront. |
| `ModalRunFunction` | Trigger a deployed function. Returns run_id; poll for the result. |
| `ModalGetFunctionStatus` | Poll a run for status/result/error. |
| `ModalCreateVolume` | Create persistent storage. $0.20/GB-month, 1 month prepaid. |
| `ModalListVolumes` | List the caller's volumes. |
| `ModalDeleteVolume` | Delete a volume (no refund). |
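
Because `ModalRunFunction` returns only a run_id, callers have to poll `ModalGetFunctionStatus` until the run finishes. The sketch below shows that loop under stated assumptions: the `StatusResponse` shape and the status strings (`pending`, `running`, `succeeded`, `failed`) are hypothetical stand-ins for whatever the gateway RFC actually defines, and `getStatus` is an injected function rather than a real client call.

```typescript
// Hypothetical response shape; the real ModalGetFunctionStatus payload is
// defined by the gateway RFC and may use different fields or status names.
type RunStatus = "pending" | "running" | "succeeded" | "failed";

interface StatusResponse {
  status: RunStatus;
  result?: unknown;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll a run until it reaches a terminal state or the attempt budget runs out.
// Defaults cover a 24h run at one poll every 10s (8640 × 10s = 86400s).
async function pollUntilDone(
  getStatus: (runId: string) => Promise<StatusResponse>,
  runId: string,
  intervalMs = 10_000,
  maxAttempts = 8_640,
): Promise<StatusResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await getStatus(runId);
    if (res.status === "succeeded" || res.status === "failed") {
      return res; // terminal: hand back result or error to the caller
    }
    if (attempt < maxAttempts - 1) {
      await sleep(intervalMs); // no pointless wait after the final attempt
    }
  }
  throw new Error(`run ${runId} still not finished after ${maxAttempts} polls`);
}
```

Compute is already paid at deploy time, so polling itself costs nothing extra; the attempt cap only keeps a client from waiting forever on a run that never reports a terminal state.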

Use case

Closes the gap between the existing 24h-capped Sandbox path (`ModalCreate`) and real ML workflows: fine-tuning, batch jobs with checkpoints, multi-day data pipelines.

Pricing

v1 charges upfront at deploy time using the same hourly tiers as the long-task sandbox ($0.10/h CPU → $8/h H100). There is NO REFUND on early termination, so over-allocating `timeout` wastes USDC.
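
To make the "over-allocating `timeout` wastes USDC" point concrete, here is a minimal sketch of the upfront charge, assuming it is exactly max timeout in hours × hourly rate. Only the two tier endpoints quoted above are filled in; the intermediate GPU tiers are omitted rather than guessed, and rounding up to a whole cent is an assumption, not documented gateway behavior.

```typescript
// Hourly rates in US cents, limited to the two endpoints quoted in this PR.
const HOURLY_RATE_CENTS = {
  CPU: 10, // $0.10/h
  H100: 800, // $8/h
} as const;

type Tier = keyof typeof HOURLY_RATE_CENTS;

// Upfront charge = max timeout (hours) × hourly rate.
// Assumption: rounded up to a whole cent; the gateway may settle differently.
function upfrontChargeCents(timeoutSeconds: number, tier: Tier): number {
  return Math.ceil((timeoutSeconds * HOURLY_RATE_CENTS[tier]) / 3600);
}

const fmt = (cents: number) => `$${(cents / 100).toFixed(2)}`;

console.log(fmt(upfrontChargeCents(24 * 3600, "H100"))); // $192.00
console.log(fmt(upfrontChargeCents(2 * 3600, "H100"))); // $16.00
```

Under this model, a 2-hour H100 job deployed with a 24h timeout pays $192 upfront instead of $16, and the $176 difference is not refunded in v1.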

Smart-rebate / actual-usage settlement is Phase B, documented in the gateway team's Notion checklist.

Test plan

  • Build passes (`npm run build`)
  • After the gateway is deployed, run an e2e test from the Franklin CLI:
    • Deploy a trivial CPU function
    • Run it with input
    • Status returns expected result
    • Create + list + delete a volume

No breaking changes

All existing ModalCreate/Exec/Status/Terminate capabilities remain. New capabilities are additive in the `modalCapabilities` array.

KillerQueen-Z and others added 12 commits April 30, 2026 02:03
This is the first feature/vscode-extension* branch built on top of
origin/main directly rather than a stack of cherry-picks. The previous
branch had drifted ~500 commits behind main as upstream shipped:

- v3.10.0 detached background tasks (Detach tool + franklin task CLI)
- v3.9.0  Skills system (SKILL.md loader, registry, bundled grill)
- v3.9.1  status bar shows chain + default spend cap raised to $2
- v3.9.2  Kimi K2.6 alignment
- v3.9.3  /model picker trim 28 → 23
- v3.9.4  roleplayed JSON tool-calls + V4 Flash / Omni metadata
- v3.9.5  Nemotron Omni prose stripping + gpt-image-2 size pin
- v3.9.6  reasoning-model TTFB defaults + long-task guidance
- v3.8.40 i2i timeout (#19) + configurable spend cap (#20) — already
          our PRs, now confirmed merged
- v3.8.41 smart timeout recovery (#26)
- v3.8.42 default spend cap $0.25 → $1.00 (#28)
- v3.8.43 proxy: per-request timeout + payment-aware fallback (#31)
- #34     SKILL.md skills loader
- #35     first-class Wallet tool

Cherry-picking each onto the old branch would have produced a wall of
no-op-content / phantom conflicts (the cherry-picks didn't share commit
hashes with main even though their content matched). Instead this
branch starts from origin/main and re-applies only the bits that are
genuinely extension-specific:

- vscode-extension/  (entire directory — webview app, build, README,
                     mascot images, VSIX assets)
- src/api/vscode-session.ts  (new file: extension-host session helper)
- src/commands/config.ts  (added default-image-model + default-video-
                          model keys; exported saveConfig for the
                          settings popover; kept main's $1 default
                          comment + max-turn-spend-usd key)
- src/agent/streaming-executor.ts  (added ImageGen / VideoGen case to
                                   inputPreview so timeline shows model)
- src/commands/doctor.ts  (export runChecks so vscode-session can
                          re-export it as runDoctorChecks)
- package.json  (./vscode-session export — alongside ./wallet, etc.)

Bumps vscode-extension to 0.5.0 (was 0.4.5). Also adds
vscode-extension/*.vsix to .gitignore; packaged builds shouldn't be tracked.

Old feature/vscode-extension preserved at backup/vscode-extension-pre-sync.

Mirror of upstream PR #36 (fix/savings-includes-media-cost). The
"Saved vs Opus" panel hero would show negative dollar amounts as
soon as a user spent meaningfully on ImageGen / VideoGen, e.g.

    $-8.79
    You spent $20.4896 instead of $11.70

Root cause: getStatsSummary() compared an Opus-token baseline (chat
only — image/video log inputTokens=0/outputTokens=0) against
totalCostUsd (chat + media combined), so once media spend exceeded
the chat-vs-Opus delta the difference flipped negative.

Fix: split byModel into chatOnlyCost (rows with tokens) and mediaCost
(rows without). opusCost on the display side now equals
opusChatCost + mediaCost so "you spent X instead of Y" stays
apples-to-apples; saved = max(0, opusChatCost - chatOnlyCost) is
the chat-side delta only and is clamped non-negative.
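
The fix described above can be sketched as a single pass over the per-model rows. This is an illustration, not the actual `getStatsSummary()` code: the `ModelUsage` field names are hypothetical, and `opusChatBaseline` stands in for whatever function prices a chat row at Opus rates.

```typescript
// Hypothetical row shape for one byModel entry; real field names may differ.
interface ModelUsage {
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

interface SavingsSummary {
  spentUsd: number; // chat + media: what the user actually paid
  opusDisplayUsd: number; // opusChatCost + mediaCost ("instead of Y")
  savedUsd: number; // chat-side delta only, clamped at 0
}

function summarizeSavings(
  rows: ModelUsage[],
  // Opus-equivalent cost of a chat row (assumed helper, not shown in the PR).
  opusChatBaseline: (inputTokens: number, outputTokens: number) => number,
): SavingsSummary {
  let chatOnlyCost = 0;
  let mediaCost = 0;
  let opusChatCost = 0;
  for (const row of rows) {
    const isChat = row.inputTokens > 0 || row.outputTokens > 0;
    if (isChat) {
      chatOnlyCost += row.costUsd;
      opusChatCost += opusChatBaseline(row.inputTokens, row.outputTokens);
    } else {
      // image/video rows log 0/0 tokens, so their spend is counted as media
      mediaCost += row.costUsd;
    }
  }
  return {
    spentUsd: chatOnlyCost + mediaCost,
    opusDisplayUsd: opusChatCost + mediaCost,
    savedUsd: Math.max(0, opusChatCost - chatOnlyCost),
  };
}
```

Adding `mediaCost` to both sides of "you spent X instead of Y" keeps the comparison apples-to-apples, while `savedUsd` never reflects media spend at all, which is what removes the negative hero value.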

Bumps vscode-extension to 0.5.1; updates README changelog.

…ion-v0.5

# Conflicts:
#	src/panel/html.ts
#	src/stats/tracker.ts
…+ history rename/delete + Detach cwd fix + insights category breakdown + wallet QR + GPU sandbox panel + session import + rate-limit toast

This branch preserves the in-progress parallel image/video generation
feature (concurrent: 'batch' + askUser merge + walletReservation +
runBatchPool), which the v0.5 extension branch will revert until it's
been validated end-to-end.

Companion features intentionally kept on v0.5:
- Modal sandbox tools (use walletReservation but only from Modal,
  not media gen)
- Detach cwd-resolution fix
- History rename/delete
- Wallet QR popover
- Tasks + GPU Sandboxes overlay panels
- Session import (Claude Code / Codex)
- Rate-limit friendly toast
- Insights By Category breakdown

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… feature/parallel-media-gen

The cherry-pick brought in everything from the WIP branch including the
in-progress parallel image/video pipeline (concurrent: 'batch', batch
preflight, askUser mutex, walletReservation, batch-concurrency config,
settings UI). That pipeline hasn't been validated end-to-end yet, so it's
deferred to feature/parallel-media-gen until ready.

What's KEPT on v0.5 (validated, ship-ready):
- Modal sandbox tools (ModalCreate/Exec/Status/Terminate) + GPU sandbox panel
- Detach cwd-resolution bug fix + 4-strategy fallback
- History rename / delete with inline confirm UI
- Wallet QR popover (chain-aware EIP-681 / Solana Pay)
- Tasks overlay + badge
- Session import (Claude Code / Codex)
- Rate-limit friendly toast
- Insights By Category breakdown (chat/media/sandbox)
- Image gen: response_format strip for gpt-image-* family + verbose error
  diagnostics + async polling for slow models
- Default-image-model / default-video-model config consultation
- Defensive sanitizeOutgoingMessages in llm.ts
- Modal tool exempt from 3-failure auto-disable in tool-guard
- Settings popover refresh + obsolete max-turn-spend-usd auto-strip

What's REMOVED (now only on feature/parallel-media-gen):
- 'batch' concurrent mode in CapabilityHandler
- BATCH_CONCURRENCY pool + runBatchPool in streaming-executor
- preflightBatch + askUserChain mutex + batchPreApproved Set
- skipAskUser in ExecutionScope
- walletReservation usage from imagegen/videogen
- batch-concurrency config key
- 'Parallel image / video' setting input
- Random suffix on default output paths

WalletReservation infrastructure stays (Modal tools use it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings in 120 commits since the last sync (1106ef5), including:

Vision (this is the big-ticket fix):
- PR #53 + sibling-sites patch: preserve image blocks in
  budgetToolResults / ageOldToolResults / dedup; client-side sharp
  resize on Read (1.9MB PNG -> 117KB).
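
The core of the vision fix is that result-shrinking passes must not treat image blocks like text. A minimal sketch of that idea follows. The `Block` union is a hypothetical, Anthropic-style content shape, and `budgetBlocks` is an illustrative name, not the real `budgetToolResults`.

```typescript
// Hypothetical content-block shape; Franklin's actual types may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "image"; mediaType: string; data: string };

// Trim text blocks to a shared character budget, but pass image blocks
// through untouched so the model still receives the pixels.
function budgetBlocks(blocks: Block[], maxTextChars: number): Block[] {
  let remaining = maxTextChars;
  const out: Block[] = [];
  for (const block of blocks) {
    if (block.type === "image") {
      out.push(block); // never dropped, truncated, or stringified
      continue;
    }
    if (remaining <= 0) continue; // text budget exhausted: drop further text
    const text =
      block.text.length > remaining
        ? block.text.slice(0, remaining) + " [truncated]"
        : block.text;
    remaining -= Math.min(block.text.length, remaining);
    out.push({ type: "text", text });
  }
  return out;
}
```

If an image block were instead serialized or dropped to fit the budget, the model would only see a placeholder and describe an image it never received, which matches the hallucinated-description symptom this fix targets.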

LLM / gateway:
- Gemini Pro non-streaming requests
- 429 Retry-After honoring
- Stream char sanitizer (U+2502 / U+2500 -> ASCII)
- Gateway error text doesn't kill session
- Classifier separates payment_rejected from payment_required
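
"429 Retry-After honoring" comes down to parsing the header correctly. Per HTTP semantics, `Retry-After` is either a non-negative integer number of seconds or an HTTP-date. The sketch below handles both. The 60s cap is an illustrative assumption, not Franklin's setting. The `string | null` parameter matches what `Response.headers.get("retry-after")` returns.

```typescript
// Convert a Retry-After header value into a delay in ms.
// Returns undefined when absent or unparseable so the caller can fall back
// to its own backoff. capMs is an assumed safety limit.
function retryAfterMs(
  value: string | null,
  nowMs: number = Date.now(),
  capMs = 60_000,
): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    // delay-seconds form, e.g. "Retry-After: 5"
    return Math.min(Number(trimmed) * 1000, capMs);
  }
  // HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.min(Math.max(0, dateMs - nowMs), capMs);
}
```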

Stats / cost:
- franklin stats reads cost_log.jsonl (SDK ledger)
- recorded-vs-wallet gap detection
- image/video/modal latency measured at 5 callsites
- agent-loop measures real LLM latency (was hardcoded 0)

Loop / agent:
- same-tool warn-once + signature-based stuck detector
- switch model when intent declared without tool_use
- --resume preserves cost / token totals
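
One way to read "same-tool warn-once + signature-based stuck detector" is sketched below. This is a hypothetical design, not the shipped one. A call's signature is the tool name plus its top-level arguments serialized with sorted keys. Repeating the same signature `threshold` times in a row triggers a single warning, and further repeats are reported as stuck. The threshold of 3 is an assumption.

```typescript
// Signature = tool name + top-level args with sorted keys,
// so {a:1,b:2} and {b:2,a:1} produce the same signature.
function callSignature(tool: string, args: Record<string, unknown>): string {
  const sorted = Object.keys(args)
    .sort()
    .reduce<Record<string, unknown>>((acc, k) => {
      acc[k] = args[k];
      return acc;
    }, {});
  return `${tool}:${JSON.stringify(sorted)}`;
}

class StuckDetector {
  private last = "";
  private repeats = 0;
  private readonly warned = new Set<string>();
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold; // assumed value; tune against real transcripts
  }

  // "warn" the first time a signature hits the threshold consecutively,
  // "stuck" on every later consecutive hit, "ok" otherwise.
  record(tool: string, args: Record<string, unknown>): "ok" | "warn" | "stuck" {
    const sig = callSignature(tool, args);
    this.repeats = sig === this.last ? this.repeats + 1 : 1;
    this.last = sig;
    if (this.repeats < this.threshold) return "ok";
    if (!this.warned.has(sig)) {
      this.warned.add(sig);
      return "warn";
    }
    return "stuck";
  }
}
```

Keying on the full signature instead of the tool name alone avoids flagging legitimate bursts like reading several different files in a row.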

Wallet / Swap:
- Base0xGaslessSwap (user pays no ETH for gas)
- Base 0x V2 + Permit2
- Jupiter Ultra swap with on-chain referral fee

Prediction Market: full rewrite of wallet-analysis triplet,
smartMoney replacement, walletProfile addresses fix.

Trading: TickerToId expansion (TON etc), dual-listing notice for
tokenized equities, live-swap session cap.

Modal: latency tracking, logger migration.

ImageGen: HTTP 202 queue handling, latency, error surfacing.

Conflict resolution:
- tools/modal.ts, tools/index.ts, tasks/spawn.ts,
  session/storage.ts, agent/tool-guard.ts: take main.
- session/storage.ts: ported v0.5's deleteSession +
  renameSession + SessionMeta.title back on top of main.
- tools/imagegen.ts: hand-merged. Kept v0.5's broader async
  detection (handles HTTP 202 + status fields + non-JSON body
  fallback), kept main's bits-based error surfacing + exported
  pollImageJob singleton, deduped poll helpers.
- tools/videogen.ts: re-added missing videoGenCapability
  singleton export so test fixtures keep importing it.

Tests: 368/368 pass. Build clean.

Bumps the VS Code extension to 0.6.1 and rebundles out/extension.cjs
on top of the v0.5↔main merge. The user-visible win is the vision
fix: image paste / drop / file Read no longer over-charges
($0.50/call → bounded by client-side sharp resize) and no longer
hallucinates descriptions (image blocks now survive the optimizer
pipeline end to end).

The bundled franklin core jumps from v3.10.x territory to v3.15.90,
picking up the 120 commits enumerated in the merge message — the
extension inherits all of them with no extension-side code change
required (UI surfaces of the new prediction / Base / Modal / etc.
tools land automatically through the agent's tool inventory).

…image guard

Brings in d370a38 + 5003b67. The two patches do exactly what we were
about to design ourselves (and did some research on, comparing
opencode / Aider / Continue / OpenHands / Cline patterns):

- New src/router/vision.ts: curated vision-model allowlist,
  basename-anchored image-path regex, family-aware sibling picker.
- routeRequest / routeRequestAsync / resolveTierToModel take a new
  needsVision flag. Auto routing now walks the tier chain for the
  first vision-capable model when an image is in play; escalates to
  COMPLEX (Opus) if the whole tier is text-only.
- Manual-mode guard in agent/loop.ts: detects image refs in user
  input on turn 1, swaps the user's text-only pick to the closest
  family vision sibling for ONE turn with a visible warning. Next
  turn's baseModel recovery restores the user's pick.
- proxy/server.ts mirrors the same logic on the Anthropic proxy
  path (scans messages[] for image / image_url / input_image parts
  plus paths in text parts).
- 5 new tests; 373/373 pass total.
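
The auto-routing rule above (walk the tier chain for the first vision-capable model, escalate to COMPLEX if none qualifies) and the image-path detection can be sketched as follows. Everything here is hypothetical: the allowlist entries are placeholder IDs, and the regex is one plausible reading of "basename-anchored". It requires the extension to end a path-like token and stops a match at a word boundary. It is not the pattern in src/router/vision.ts.

```typescript
// Placeholder allowlist; the real curated list lives in src/router/vision.ts.
const VISION_MODELS = new Set(["vision-model-a", "vision-model-b"]);

// A basename ending in an image extension, preceded by start/space/quote/slash
// and followed by end, whitespace, closing punctuation, or a sentence-final ".".
const IMAGE_PATH =
  /(?:^|[\s"'(/\\])[\w.-]+\.(?:png|jpe?g|gif|webp)(?=$|[\s"'),]|\.(?:\s|$))/i;

function pickModel(tierChain: string[], needsVision: boolean, complexModel: string): string {
  if (!needsVision) return tierChain[0] ?? complexModel;
  // first vision-capable model in the tier; escalate if the tier is text-only
  return tierChain.find((m) => VISION_MODELS.has(m)) ?? complexModel;
}
```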

Better than the design we discussed: their swap-with-warning
single-turn approach beats the silent-strip pattern that opencode /
Continue / OpenHands all use, by avoiding the "user can't tell what
model is running" failure mode of silent model substitution.

…ector

Brings in d8803cd + 4ddf2f1. Adds:
- Per-(tool, category) classification of tool failures
- Anomaly detector that flags tools with above-baseline failure rates
- 'franklin doctor --anomaly' surfaces the report
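
A minimal sketch of "flags tools with above-baseline failure rates", assuming the baseline is the overall failure rate across all tool calls. Here a tool is flagged when its own rate exceeds `factor` × baseline and it has at least `minCalls` samples. Both knobs and the `ToolEvent` shape are hypothetical. The per-category classification mentioned above is left out.

```typescript
// Hypothetical event shape: one entry per tool call.
interface ToolEvent {
  tool: string;
  ok: boolean;
}

function flagAnomalies(events: ToolEvent[], factor = 2, minCalls = 5): string[] {
  if (events.length === 0) return [];
  const perTool = new Map<string, { calls: number; failures: number }>();
  let totalFailures = 0;
  for (const e of events) {
    const s = perTool.get(e.tool) ?? { calls: 0, failures: 0 };
    s.calls++;
    if (!e.ok) {
      s.failures++;
      totalFailures++;
    }
    perTool.set(e.tool, s);
  }
  const baseline = totalFailures / events.length; // overall failure rate
  const flagged: string[] = [];
  perTool.forEach((s, tool) => {
    // skip tools with too few calls to judge, then compare against baseline
    if (s.calls >= minCalls && s.failures / s.calls > factor * baseline) {
      flagged.push(tool);
    }
  });
  return flagged.sort();
}
```

The `minCalls` floor keeps a tool that failed once in its only call from dominating the report.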

Conflict: package-lock.json (regenerated via npm install).
Tests: 381/381 pass.

Pairs with BlockRunAI/blockrun rfc/modal-full-chain (gateway PR).

New capabilities:
  - ModalDeployFunction: register a long-running Python function on Modal
    (custom pip deps, GPU choice, up to 24h timeout). Charges max_timeout
    × hourly rate upfront — same model as long-task sandbox.
  - ModalRunFunction: trigger a deployed function. Returns run_id; poll
    for result. Compute already paid at deploy.
  - ModalGetFunctionStatus: poll a run for status/result/error.
  - ModalCreateVolume: create persistent storage. $0.20/GB-month, 1mo
    prepaid. Up to 200GB per wallet.
  - ModalListVolumes: list caller's volumes.
  - ModalDeleteVolume: delete a volume (no refund).
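
The volume pricing works out as follows: size × $0.20 × 1 prepaid month, bounded by the 200GB-per-wallet limit. The sketch below reads that limit as a total across all of a wallet's volumes and requires whole-GB sizes. Both readings are assumptions. `volumeCreateChargeCents` is a hypothetical client-side pre-flight helper. The gateway enforces the real rules.

```typescript
const VOLUME_CENTS_PER_GB_MONTH = 20; // $0.20/GB-month
const PREPAID_MONTHS = 1;
const MAX_GB_PER_WALLET = 200; // assumed to be a total across the wallet's volumes

// Hypothetical pre-flight check before calling ModalCreateVolume.
function volumeCreateChargeCents(requestedGb: number, existingGbForWallet: number): number {
  if (!Number.isInteger(requestedGb) || requestedGb <= 0) {
    throw new Error("size must be a positive whole number of GB"); // assumption
  }
  if (existingGbForWallet + requestedGb > MAX_GB_PER_WALLET) {
    throw new Error(`would exceed ${MAX_GB_PER_WALLET}GB per wallet`);
  }
  return requestedGb * VOLUME_CENTS_PER_GB_MONTH * PREPAID_MONTHS;
}
```

A 50GB volume therefore costs $10.00 upfront, and deleting it early with ModalDeleteVolume returns nothing.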

These close the gap between the 24h-capped Sandbox path and the long-
running ML workflows agents need (fine-tuning, batch jobs, persistent
checkpoints). Smart-rebate / actual-usage settlement is Phase B
(documented separately in the gateway team's Notion checklist) — v1
charges upfront and does not refund early-finish.

Wire-level design: see RFC in BlockRunAI/blockrun (rfc/modal-full-chain).
Gateway must be deployed first; this client PR is a no-op until then.

Franklin's main hadn't run CI since 2026-04-21; some package.json change
landed without an accompanying lockfile bump, so 'npm ci' fails on:

  npm error Missing: utf-8-validate@5.0.10 from lock file

Regenerated cleanly via 'rm -rf node_modules package-lock.json && npm install'.
Lockfile is now in sync with current package.json. This commit is
unrelated to the Modal capabilities being added in this PR — included
solely to unblock CI on this branch (and incidentally on main too).